Adapting the Unisyn Lexicon to Portuguese: Preliminary issues in the development of LUPo
Abstract
This paper presents some preliminary issues and proposed solutions in the development of an accent-independent pronunciation lexicon for Portuguese, known as the Portuguese Unisyn Lexicon (LUPo). LUPo's objectives are presented within the context of the Portal da Língua Portuguesa knowledge base. Key considerations are addressed for encoding morphological boundaries, treating orthographic forms, and handling loan words. It is argued that the knowledge-driven paradigm exemplified in the original English Unisyn Lexicon, together with the Portal da Língua Portuguesa's relational structure and rich lexicographic content, provides a good foundation for establishing a tightly integrated and well-informed system.
Similar resources
A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese
One of the major obstacles in deploying spoken language technologies (SLTs) in the developing world is a lack of key linguistic resources – e.g. electronic dictionaries, phonetically aligned corpora, pronunciation lexicons, etc. – that describe the non-dominant varieties spoken in such countries and regions. In this paper, we describe the work of the LUPo (Portuguese Unisyn Lexicon) project to ...
The Role of Morphology in Generating High-Quality Pronunciation Lexica for Regional Variants of Portuguese
Grapheme to phoneme (GTP) systems for languages such as English, German, and Korean have been shown to achieve better performance rates with the inclusion of a morpho-phonological preprocessing component. While semiautomatic and automatic GTP approaches for Portuguese continue to achieve steady gains, such algorithms do not take morphology into account, despite a growing need to do so, based in...
Models of EFL Learners’ Vocabulary Development: Spreading Activation vs. Hierarchical Network Model
Semantic network approaches view organization or representation of internal lexicon in the form of either spreading or hierarchical system identified, respectively, as Spreading Activation Model (SAM) and Hierarchical Network Model (HNM). However, the validity of either model is amongst the intact issues in the literature which can be studied through basing the instruction compatible wi...
Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...
The Keyword Lexicon - An accent-independent lexicon for automatic speech recognition
Recent work at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh has developed an accent-independent lexicon for speech synthesis (the Unisyn project). The main purpose of this lexicon is to avoid the problems and cost of writing a new lexicon for every new accent needed for synthesis. Only recently [1], a first attempt has been made to use the Keyword Lexicon for ...